Co-training and Self-training for Word Sense Disambiguation
Abstract
This paper investigates the application of co-training and self-training to word sense disambiguation. It examines optimal and empirical parameter selection methods for both algorithms, which yield varying degrees of error reduction. It also introduces a new method that combines co-training with majority voting, which smooths the bootstrapping learning curves and improves average performance.
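As a rough illustration of the self-training idea named in the abstract, the sketch below repeatedly trains a classifier on the labeled examples, labels the unlabeled pool, and moves the most confident predictions into the training set. It is not the paper's classifier or parameter setting: the word-count scorer, the function names `train`, `predict`, and `self_train`, and the parameters `iterations`, `growth`, and `threshold` are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count how often each context word occurs with each sense."""
    counts = defaultdict(Counter)
    for words, sense in labeled:
        counts[sense].update(words)
    return counts


def predict(model, words):
    """Return (sense, confidence) from add-one-smoothed word overlap scores."""
    scores = {sense: 1 + sum(c[w] for w in words) for sense, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def self_train(labeled, unlabeled, iterations=3, growth=1, threshold=0.6):
    """Hypothetical self-training loop: grow the labeled set from the pool."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = train(labeled)
        # Rank pool items by the confidence of their predicted sense.
        scored = sorted(((predict(model, w), w) for w in pool),
                        key=lambda item: item[0][1], reverse=True)
        # Keep at most `growth` items, and only those above the threshold.
        confident = [(w, sense) for (sense, conf), w in scored[:growth]
                     if conf >= threshold]
        if not confident:
            break
        labeled.extend(confident)
        chosen = [w for w, _ in confident]
        pool = [w for w in pool if w not in chosen]
    return train(labeled)


# Toy data for the ambiguous word "bank" (invented for illustration).
seed = [(["money", "loan"], "finance"), (["river", "water"], "river")]
pool = [["loan", "interest"], ["water", "shore"], ["interest", "rate"]]
model = self_train(seed, pool, growth=2)
print(predict(model, ["shore"])[0])
```

The `growth` and `threshold` values here play the role of the parameters whose selection the abstract discusses; the values used are arbitrary.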
Similar resources
Self-training and co-training in biomedical word sense disambiguation
Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction that attempts to select the proper sense of ambiguous words. Because training data is scarce, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, is worth researching. We present preliminary results of two semi-supervised learnin...
Reentrenamiento: Aprendizaje Semisupervisado de los Sentidos de las Palabras
This paper presents re-training, a bootstrapping algorithm that automatically acquires semantically annotated data, ensuring high levels of precision. This algorithm uses a corpus-based system of word sense disambiguation that relies on maximum entropy probability models. The re-training method consists of the iterative feeding of training-classification cycles with new and high-confidence exam...
Latent Semantic Word Sense Disambiguation Using Global Co-occurrence Information
In this paper, I propose a novel word sense disambiguation method based on global co-occurrence information using NMF. When calculating the dependency relation matrix, the existing method tends to produce a very sparse co-occurrence matrix from a small training set. As a result, the NMF algorithm sometimes does not converge to the desired solutions. To obtain a large number of co-occurrence relatio...
Word Sense Disambiguation Using Vectors of Co-occurrence Information
This paper reports on the word sense disambiguation of Korean nouns using co-occurrence information in context. For a given noun, its local contextual word distribution is not enough to express its semantic characteristics for noun sense disambiguation. This paper proposes a cluster-based sense as a base vector. Contextual noise is removed by a term weighting method, and hypernyms of remain...
Word Sense Disambiguation using Static and Dynamic Sense Vectors
It is common in WSD to use contextual information in sense-tagged training data. Co-occurring words within a limited window-sized context support one sense among the semantically ambiguous senses of the word. This paper reports on word sense disambiguation of English words using static and dynamic sense vectors. First, context vectors are constructed using contextual words in the training sens...